{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "source": [
    "AutoML通过若干次试验来搜索最优的模型及其超参数组合。在实验结束，除了返回最优的模型外，还可以对性能排在前k的模型进行集成学习，从而提升pipeline的泛化性。\n",
    "\n",
    "在HyperTS中，我们引入了一种名为GreedyEnsemble[1]的集成方法来进行模型集成。它的具体执行过程如下："
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. 初始化空的集成栈；\n",
    "\n",
    "2. 在topk的模型库中的模型依次添加到集成栈中，并在验证集上评估误差从而最大化集成栈的性能；\n",
    "\n",
    "3. 重复步骤2以固定迭代次数或者直到所有模型均被使用。\n",
    "\n",
    "4. 计算集成栈中各个模型的权重，然后返回在验证集表现最佳的嵌套集成模型。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "参考文献\n",
    "\n",
    "[1] Caruana, Rich, et al. \"Ensemble selection from libraries of models.\" in ICML. 2004."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**使用示例:**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 1. 准备数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from hyperts.datasets import load_network_traffic\n",
    "from sklearn.model_selection import train_test_split"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = load_network_traffic(univariate=True)\n",
    "train_data, test_data = train_test_split(df, test_size=168, shuffle=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2. 创建实验并运行它"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "from hyperts import make_experiment"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "设置参数 ``ensemble_size`` 去控制集成模型的数量。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    }
   ],
   "source": [
    "experiment = make_experiment(train_data=train_data.copy(),\n",
    "                             task='forecast',\n",
    "                             mode='dl',\n",
    "                             timestamp='TimeStamp',\n",
    "                             covariates=['HourSin', 'WeekCos', 'CBWD'],\n",
    "                             forecast_train_data_periods=24*12,\n",
    "                             max_trials=10,\n",
    "                             ensemble_size=5)\n",
    "model = experiment.run()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<bound method Pipeline.get_params of Pipeline(steps=[('data_preprocessing',\n",
       "                 TSFDataPreprocessStep(covariate_cleaner=CovariateTransformer(covariables=['HourSin',\n",
       "                                                                                           'WeekCos',\n",
       "                                                                                           'CBWD'],\n",
       "                                                                              data_cleaner_args={'correct_object_dtype': False,\n",
       "                                                                                                 'int_convert_to': 'str'}),\n",
       "                                       covariate_cleaner__covariables=['HourSin',\n",
       "                                                                       'WeekCos',\n",
       "                                                                       'CBWD'],\n",
       "                                       covariate_cleaner__data_cleaner_args={'correct_object_dtype': False,\n",
       "                                                                             'int_convert_to': 'str'},\n",
       "                                       covariate_cols=['HourSin', 'WeekCos',\n",
       "                                                       'CBWD'],\n",
       "                                       ensemble_size=5, freq='H',\n",
       "                                       name='data_preprocessing',\n",
       "                                       timestamp_col=['TimeStamp'],\n",
       "                                       train_data_periods=288)),\n",
       "                ('estimator',\n",
       "                 TSGreedyEnsemble(weight=[0.6, 0.0, 0.4, 0.0, 0.0], scores=[-2.410948636898898, -2.042171581044738, -1.9751808480925728, -1.9869096606536953, -1.991527303840764]))])>"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model.get_params"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3. 推理并评估"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Metirc</th>\n",
       "      <th>Score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>mae</td>\n",
       "      <td>2.5579</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>mse</td>\n",
       "      <td>15.8978</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>rmse</td>\n",
       "      <td>3.9872</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>mape</td>\n",
       "      <td>0.4066</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>smape</td>\n",
       "      <td>0.3483</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Metirc   Score\n",
       "0    mae  2.5579\n",
       "1    mse 15.8978\n",
       "2   rmse  3.9872\n",
       "3   mape  0.4066\n",
       "4  smape  0.3483"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X_test, y_test = model.split_X_y(test_data.copy())\n",
    "forecast = model.predict(X_test)\n",
    "results = model.evaluate(y_true=y_test, y_pred=forecast)\n",
    "results"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
